Genes containing multiple pre-mRNA cleavage and polyadenylation sites, or polyA sites, express 
mRNA isoforms with variable 3' untranslated regions (UTRs). By systematic analysis of human and 
mouse transcriptomes, we found that short 3'UTR isoforms are relatively more abundant when 
genes are highly expressed whereas long 3'UTR isoforms are relatively more abundant when genes 
are lowly expressed. Reporter assays indicated that polyA site choice can be modulated by 
transcriptional activity through the gene promoter. Using global and reporter-based nuclear run-on 
assays, we found that RNA polymerase II is more likely to pause at the polyA site of highly expressed 
genes than that of lowly expressed ones. Moreover, highly expressed genes tend to have a lower level 
of nucleosome but higher H3K4me3 and H3K36me3 levels at promoter-proximal polyA sites relative 
to distal ones. Taken together, our results indicate that polyA site usage is generally coupled to 
transcriptional activity, leading to regulation of alternative polyadenylation by transcription. 
Molecular Systems Biology 7: 534; published online 27 September 2011; doi:10.1038/msb.2011.69 
Introduction 

Expression of protein-coding genes in eukaryotes involves 
multiple transcriptional and post-transcriptional processes, 
which are increasingly found to be interconnected (Maniatis 
and Reed, 2002; Moore and Proudfoot, 2009). The 3' end 
processing of pre-mRNAs, involving cleavage of nascent RNAs 
and synthesis of the poly(A) tail (Colgan and Manley, 1997), is 
critical for termination of transcription and interplays with 
pre-mRNA splicing (Buratowski, 2005; Millevoi and Vagner, 
2010; Kuehner et al, 2011). A recent study also implicates 
it in initiation of transcription (Mapendano et al, 2010). 
Pre-mRNA cleavage and polyadenylation, or mRNA polyadenylation, 
is carried out by the 3' end processing complex, which 
was recently found to comprise over 85 proteins in human 
cells (Shi et al, 2009). Interestingly, this complex includes not 
only the well-known core polyadenylation factors, or polyA 
factors for simplicity, such as CPSF, CstF, CF Im, and CF IIm 
proteins, but also proteins with roles in DNA damage repair, 
transcription, splicing, translation, etc., underscoring the 
diverse connections between mRNA polyadenylation and 
other cellular processes. 

Over half of the human genes contain more than one polyA 
site (Tian et al, 2005; Yan and Marr, 2005), resulting in mRNA 
isoforms with different protein-coding regions and/or 3' 
untranslated regions (3'UTRs). The pattern of alternative 
cleavage and polyadenylation, or alternative polyadenylation 
(APA), of genes is variable across tissues (Zhang et al, 2005; 
Wang et al, 2008), and is highly regulated during development 
and when cells change proliferation/differentiation states 
(Sandberg et al, 2008; Ji et al, 2009). In general, short 3'UTR 
isoforms resulting from usage of promoter-proximal polyA 
sites are relatively more abundant when cells are proliferative, 
transformed, or undifferentiated (Sandberg et al, 2008; Ji and 
Tian, 2009; Mayr and Bartel, 2009). Since 3'UTRs contain 
various cis elements involved in post-transcriptional gene 
regulation, such as mRNA localization, translation, and mRNA 
stability, APA can impact mRNA metabolism and protein 
expression level in the cytoplasm. Given that the predominant 
cis elements in 3'UTRs are those controlling mRNA stability, 
such as AU-rich elements, GU-rich elements, and microRNA 
target sites (Garneau et al, 2007; Vlasova et al, 2008; Bartel, 
2009), it is conceivable that APA may have a widespread role in 
controlling mRNA half-life. Indeed, shortening of 3'UTRs has 
been shown to cause increased mRNA stability and higher 
protein expression for a number of oncogenes (Mayr and 
Bartel, 2009). 

Several mechanisms that regulate APA have been reported. 
First, modulation of specific polyA factors has been shown to 
alter polyA site choice. For example, upregulation of CstF64 
during B-cell maturation results in higher usage of a promoter- 
proximal polyA site in the IgM heavy chain gene (Takagaki 
et al, 1996), and knockdown of the 25-kDa subunit of CF Im 
was shown to alter APA for a number of genes in HeLa cells 
(Kubo et al, 2006). Consistently, a general inverse correlation 
between mRNA expression of polyA factors and global 3'UTR 
length was found in various tissues during development and in 
reprogrammed cells (Ji and Tian, 2009), indicating modulation 
of the general polyadenylation activity may be responsible for 
APA regulation in cell proliferation/differentiation. Second, 
various RNA binding proteins (RBPs) have been shown to 
modulate APA by interacting with cis elements adjacent to the 
polyA site (Millevoi and Vagner, 2010). An emerging theme is 
that some RBPs previously known to regulate pre-mRNA 
splicing may also have roles in mRNA polyadenylation 
(Licatalosi and Darnell, 2010). 

Third, some proteins with apparent functions in gene 
transcription have been shown to regulate APA, such as 
ELL2, an RNA polymerase II (Pol II) elongation factor 
(Martincic et al, 2009), and Cdc73, a component of the PAF 
protein complex (PAFc) which associates with Pol II (Rozen- 
blatt-Rosen et al, 2009). In addition, accumulating evidence 
suggests mRNA polyadenylation is extensively intertwined 
with transcription: Pol II itself is an essential polyA factor 
(Hirose and Manley, 1998) and its C-terminal domain (CTD) 
interacts with several other polyA factors and has been 
implicated in coupling pre-mRNA processing to transcription 
(McCracken et al, 1997; Ahn et al, 2004; Meinhart and Cramer, 
2004; Adamson et al, 2005; Zhang and Gilmour, 2006); several 
polyA factors interact with the basal transcriptional machinery 
(Dantonel et al, 1997) and are present at the promoter region of 
genes (Venkataraman et al, 2005; Glover-Cutter et al, 2008); 
and several transcriptional factors have been shown to 
regulate 3' end processing (Rosonina et al, 2003). 

Here, we present several lines of evidence indicating that 
polyA site usage is generally coupled to transcriptional 
activity, contributing to a global correlation between the 
relative abundance of 3'UTR isoforms and gene expression 
level. Given the roles of 3'UTR in mRNA metabolism, this 
mechanism coordinates transcriptional regulation with post- 
transcriptional control via pre-mRNA processing. 

Results 

A general correlation between relative abundance 
of APA isoforms and gene expression level in 
human and mouse tissues and cells 

APA can lead to mRNA isoforms with short or long 3'UTRs 
(Figure 1A). Using a paired-end RNA-seq data set recently 
released from Illumina (Supplementary Table 1), we examined 
relative expression of APA isoforms across 16 human tissues. 
Our method detects 3'UTR length changes based on compar- 
ison of the RNA-seq reads mapped to constitutive and 
alternative portions of 3'UTR (named cUTR and aUTR, 
respectively), as defined by the 5'-most polyA site in the 
3'UTR (Figure 1A). We developed a score named relative 
expression of isoforms using distal polyA sites (RUD) to 
represent the relative abundance of APA isoforms (see 
Materials and methods). 
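
As a rough illustration of the RUD score described above, the sketch below computes the ratio of read density in the aUTR to read density in the cUTR. The function names, the per-kilobase density unit, and the example counts are assumptions made for illustration only; the actual procedure is the one described in Materials and methods.

```python
def read_density(read_count, region_length_nt):
    """Reads per kilobase of region (the unit is an assumption of this sketch)."""
    if region_length_nt <= 0:
        raise ValueError("region length must be positive")
    return read_count / (region_length_nt / 1000.0)


def rud_score(autr_reads, autr_length_nt, cutr_reads, cutr_length_nt):
    """Ratio of aUTR read density to cUTR read density.

    Returns None when the cUTR has no reads, because the ratio is undefined.
    """
    cutr_density = read_density(cutr_reads, cutr_length_nt)
    if cutr_density == 0:
        return None
    return read_density(autr_reads, autr_length_nt) / cutr_density


# Made-up counts: 120 reads over a 600-nt cUTR, 90 reads over a 1500-nt aUTR.
example = rud_score(autr_reads=90, autr_length_nt=1500,
                    cutr_reads=120, cutr_length_nt=600)
print(round(example, 3))
```

Under this definition, a lower score means that fewer transcripts extend past the 5'-most polyA site, that is, short 3'UTR isoforms are relatively more abundant.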

Interestingly, we found that highly expressed genes in 
general tended to express short 3'UTR isoforms more 
frequently than lowly expressed genes in all examined tissues 
(Figure 1B; Supplementary Figure 1A). Overall, there were 
enrichments (~2-fold over expected values) of genes with 
preferential expression of short 3'UTR isoforms when they 
were highly expressed (lower right corner in Figure 1C) and of 
genes with preferential expression of long 3'UTR isoforms 
when they were lowly expressed (upper left corner in 
Figure 1C). Conversely, there were depletions (~3-fold below 
expected values) of genes with opposite trends (upper right 
and lower left corners in Figure 1C). A similar result was 
obtained by analyzing transcriptomes in 10 human tissues 
based on the single-end RNA-seq method (Pan et al, 2008; 
Wang et al, 2008; Supplementary Figure 1B and C). 
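
The enrichment and depletion values above come from a gene density table of the kind described in the Figure 1C legend. The following is a minimal sketch of such a calculation, assuming that ranks are taken per gene across tissues and that the expected counts come from shuffling the RUD ranks. The function names, the number of randomizations, and the pseudocount are all hypothetical choices, not details taken from the study.

```python
import math
import random


def ranks(values):
    """Rank positions 0..n-1 (0 = lowest); ties are broken by original order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result


def count_table(expr_by_gene, rud_by_gene, shuffle_rng=None):
    """Count gene-tissue pairs in an n x n table indexed [RUD rank][expression rank]."""
    n = len(next(iter(expr_by_gene.values())))
    table = [[0] * n for _ in range(n)]
    for gene, expr in expr_by_gene.items():
        expr_ranks = ranks(expr)
        rud_ranks = ranks(rud_by_gene[gene])
        if shuffle_rng is not None:
            shuffle_rng.shuffle(rud_ranks)  # break the pairing between tissues
        for e, r in zip(expr_ranks, rud_ranks):
            table[r][e] += 1
    return table


def log2_obs_over_exp(expr_by_gene, rud_by_gene, n_random=100, seed=0):
    """log2(Obs/Exp) per cell; Exp is the mean count over shuffled tables."""
    rng = random.Random(seed)
    obs = count_table(expr_by_gene, rud_by_gene)
    n = len(obs)
    exp = [[0.0] * n for _ in range(n)]
    for _ in range(n_random):
        shuffled = count_table(expr_by_gene, rud_by_gene, shuffle_rng=rng)
        for i in range(n):
            for j in range(n):
                exp[i][j] += shuffled[i][j] / n_random
    pseudo = 0.5  # assumed pseudocount so that empty cells do not give log(0)
    return [[math.log2((obs[i][j] + pseudo) / (exp[i][j] + pseudo))
             for j in range(n)] for i in range(n)]
```

With 16 tissues per gene this yields a 16 x 16 matrix. Positive cells mark rank combinations that occur more often than in the randomized data, and negative cells mark combinations that occur less often.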

We next analyzed two Affymetrix exon array data sets 
corresponding to 62 types of human primary cells and cell 
lines and 54 types of mouse tissues and cell lines (Supplementary 
Table 1), respectively. It is worth noting that data from 
Affymetrix exon arrays allow strand-specific analysis of gene 
expression, whereas the RNA-seq reads analyzed do not 
intrinsically reveal the strand information of transcription. 
This might be important for our analysis since previous reports 
have indicated that antisense transcripts are pervasive in 
human cells (He et al, 2008) and are involved in regulation of 
gene expression (Xu et al, 2011). Using RUD scores derived 
from microarray probe intensities (see Materials and methods 
and Figure 1A), we found that the connection between APA 
and gene expression is consistent with, but more obvious than, 
that observed using RNA-seq data, as indicated by gene 
enrichment and depletion values (Figure 1D and E). Therefore, 
we conclude that there is a general correlation between the 
relative abundance of 3'UTR isoforms and gene expression 
level. 



Reporter assays confirmed regulation of polyA 
site choice by transcriptional activity 

Difference in relative abundance between APA isoforms can be 
attributable to two potential mechanisms: (1) 3'UTR isoforms 
have different mRNA stabilities and/or (2) isoforms are 
differentially produced resulting from alternative 3' end 
processing. The former mechanism has been shown for a 
number of genes (Edwalds-Gilbert et al, 1997; Mayr and Bartel, 
2009). Thus, we wanted to know whether the latter could have 
a role in the correlation between relative abundance of APA 
isoforms and gene expression level. To this end, we made a set 
of constructs in which the same reporter gene, capable of 
expressing two APA isoforms, was under the control of 
different promoters (Figure 2A). This experimental design 
enables analysis of the effect of transcription on APA while 
minimizing the influence of mRNA stability. Using RNase 
protection assays (RPAs), we first confirmed the two expected 
APA isoforms (Figure 2B). In addition, we found that the CMV 
promoter (PCMV) resulted in much higher (~9-fold) expression 
of the short isoform relative to the long one than did a basal 
TATA-like promoter (PTAL) (Figure 2B), indicating that polyA site 
choice can be controlled by promoter sequences. 

We then used several inducible promoters and examined 
their regulation of polyA site choice under induced or non- 
induced conditions. Using quantitative reverse transcription 
Figure 1 Gene expression level versus relative abundance of 3'UTR isoforms in human and mouse tissues and cells. (A) Schematic of APA. A gene with multiple 
polyA sites expresses isoforms with alternative 3'UTRs. Three polyA sites are shown. CDS, coding sequence; pA, polyA site; AAAn, poly(A) tail. The 3'UTR portion 
upstream of the first polyA site is called constitutive 3'UTR, or cUTR, and the downstream portion is called alternative 3'UTR, or aUTR. The ratio of RNA-seq read density 
or average microarray probe intensity of aUTR to that of cUTR is called the Relative expression of mRNA isoforms Using Distal polyA sites (RUD) score. (B) Average 
RUD scores for genes expressed at different levels across 16 human tissues (see Supplementary Figure 1 for plots of individual tissues). Genes were evenly divided into 
eight groups based on expression level. The average RUD score of genes in a group is plotted. Error bars represent 90% confidence intervals. P-value was based on a 
t-test comparing the highly expressed gene group (top 25%) with the lowly expressed gene group (bottom 25%). (C) Gene density plot showing inverse correlation between RUD 
and gene expression level in 16 human tissues. Genes in all tissues were distributed in a 16 x 16 table, with columns corresponding to rank of gene expression level and 
rows to rank of RUD in 16 tissues. The number of genes in each cell of the table was normalized to an expected number derived from randomized data. Relative gene 
density, log2(Obs/Exp), where Obs is observed number of genes and Exp is expected number of genes, is represented in a heat map according to the color scheme 
shown in the figure. (D, E) As in (C), gene density plots showing inverse correlation between RUD and gene expression level in 62 human primary cells and cell lines (D) 
and 54 mouse tissues and cell lines (E). The data for (D) and (E) were based on Affymetrix GeneChip Exon Arrays. 



PCR (qRT-PCR) and constructs containing the cAMP response 
element (PCRE) or the NFκB binding site (PNFκB) in the 
promoter, which are inducible by forskolin or TNFα, respectively, 
we found that induction of PCRE and PNFκB led to 
conspicuously higher expression of the reporter gene, as 
expected, and more usage of the promoter-proximal polyA site 
(Figure 2C). In contrast, treatment of PNFκB with forskolin or PCRE 
with TNFα had no effect on either gene expression or polyA site 
usage, indicating specificity of the regulation (Figure 2C). This 
result was also confirmed by fluorescence activated cell 
sorting analysis (Supplementary Figure 2). When the results 
for PTAL and PCMV were included, a good inverse correlation 
Figure 2 Reporter assays indicate regulation of polyA site choice by transcriptional activity. (A) Constructs used in this study. PCMV, PTAL, PCRE, PNFκB, PGRE, PHSE, 
PSRE, PE2F, and PMyoG are various promoters (see Materials and methods for details). RFP, IRES, EGFP, and Kanr are sequences encoding red fluorescent protein, 
internal ribosome entry site, enhanced green fluorescent protein, and kanamycin resistance gene, respectively. Kanr has its own promoter and polyA site. As shown in 
the graph, two polyA sites (pA) resulted in two transcript isoforms (1 and 2). Isoform 1 encodes RFP, IRES, and EGFP; and isoform 2 encodes RFP only. AAAn, poly(A) 
tail. qRT-PCR primers and RNase protection assay (RPA) probes are indicated in the graph, which were used to examine relative expression of the two isoforms. 
(B) RPA analysis of the isoforms expressed from the construct with PCMV or PTAL. Top, a representative autoradiograph of RPA results. The RPA probe is 249 nt in 
length. The 200-nt fragment corresponds to isoform 1 and the 142-nt fragment to isoform 2. The amount of sample loaded in each lane was adjusted to make the overall 
signal similar across samples. Bottom, normalized molar ratios of isoform 2 to isoform 1. The molar ratio of isoform 2 to isoform 1 was based on the amount of each RPA 
fragment quantified by PhosphorImager and the number of uracils in each fragment (UTP was used for probe labeling). The value for PCMV was set to 1, and error bars 
are standard deviation based on two experiments. (C) qRT-PCR analysis of expression level versus isoform ratio using cells transfected with constructs containing 
indicated promoters. Some of the transfected cells were treated with forskolin and/or TNFα as indicated in the graph. Expression level was measured by RFP/Kanr, and 
isoform ratio was measured by EGFP/RFP. R2 of linear regression is indicated. (D) qRT-PCR analysis of expression level and isoform ratio for constructs with PGRE, 
PHSE, and PSRE, which were induced with dexamethasone, heat shock, and serum, respectively (see Materials and methods for details). R/K is RFP/Kanr and E/R is 
EGFP/RFP. Error bars are standard error of mean (s.e.m.) based on two experiments. (E) Percent of genes with 3'UTRs lengthened or shortened in three gene groups 
with different expression changes during differentiation of C2C12 cells. DN, downregulated; NC, no change; UP, upregulated. Error bars are standard deviation based on 
two samples; P-value is based on the Fisher's exact test comparing gene numbers in three groups. (F) qRT-PCR analysis of isoforms expressed from constructs with 
PE2F or PMyoG in proliferating and differentiating C2C12 cells. Isoform ratio was measured by EGFP/RFP. Error bars are standard error of mean (s.e.m.) based on two 
experiments. See Materials and methods for more technical details. Source data is available for this figure in the Supplementary Information. 



between gene expression level and usage of the distal polyA 
site was discerned (R2=0.87). Similar trends were also 
observed for constructs containing promoters with the 
glucocorticoid response element (PGRE), the heat-shock 
response element (PHSE), and the serum response element 
(PSRE) (Figure 2D). Thus, our reporter assays confirmed that 
short APA isoforms using promoter-proximal polyA sites are 
more likely to be produced when gene expression is induced, 
suggesting that polyA site usage is coupled to transcriptional 
activity. 
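
The correlation between expression level and isoform ratio reported above can be illustrated with an ordinary least-squares fit. The sketch below computes R2 for paired values of expression level (RFP/Kanr) and isoform ratio (EGFP/RFP), the two quantities defined in the Figure 2 legend. The function name and the example numbers are hypothetical, and whether the original fit used raw or transformed values is not stated in this section, so this is only an illustration of the statistic.

```python
def r_squared(x, y):
    """Coefficient of determination for a simple least-squares line y = a + b*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    # For a single predictor, R^2 equals the squared Pearson correlation.
    return (sxy * sxy) / (sxx * syy)


# Made-up values, not data from the study.
expression_level = [1.0, 2.0, 4.0, 8.0]   # RFP/Kanr
isoform_ratio = [0.9, 0.8, 0.55, 0.2]     # EGFP/RFP
print(r_squared(expression_level, isoform_ratio))
```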

We next wanted to know whether the transcription-coupled 
APA can be detected when cells change conditions in response 

© 2011 EMBO and Macmillan Publishers Limited 



Regulation of 3' end processing by transcription 

Z Ji et al 



to developmental and environmental signals, under which the 
global APA pattern can change (Ji and Tian, 2009). To this end, 
we first reanalyzed our previously published exon array data 
for mRNAs expressed in murine C2C12 myoblasts, in which 
genes tend to express long 3'UTR isoforms more frequently 
when cells switch from proliferation to differentiation (Ji et al, 
2009). Consistent with the notion that polyA site usage is 
coupled to transcriptional activity, we found that upregulated 
genes were less likely to have 3'UTRs lengthened than 
downregulated ones (Figure 2E). This phenomenon was also 
observed when we analyzed data for breast cancer cell lines 
versus normal breast tissues and TNFα-treated lymphoblastoid 
cells versus non-treated ones (Supplementary Figure 3). 
In both cases, upregulated genes were more likely to express 
short 3'UTR isoforms than downregulated ones regardless of 
the overall trend of APA regulation in the cell. To validate this 
global analysis result, we made a construct containing a 
promoter with the E2F binding site (PE2F) and a construct 
containing the myogenin promoter (PMyoG). PE2F and PMyoG 
have been shown to be inhibited and activated, respectively, 
during C2C12 differentiation (Edmondson et al, 1992; Blais 
et al, 2005). In complete agreement with our global analysis 
result, more expression of the long 3'UTR isoform relative to 
the short one was observed for the construct containing PE2F in 
differentiating C2C12 cells as compared with proliferating 
ones; and the construct containing PMyoG showed the opposite 
trend (Figure 2F). Therefore, we conclude that modulation of 
polyA site choice by transcriptional activity can occur for 
genes whose expression is regulated when cells respond to 
developmental and environmental signals. 

To address whether our reporter assay results could be 
generalized, we made another construct (pTRE-RIF) contain- 
ing a different reporter gene and the tetracycline response 
element (TRE) in the promoter (Figure 3A), which can be 
activated by doxycycline (Dox) in HeLa Tet-On cells. Con- 
sistent with the results described above, activation of 
transcription resulted in higher gene expression and more 
usage of the proximal polyA site, as indicated both by 
luciferase assays (Figure 3B) and by qRT-PCR (Figure 3C). 
Since activation of transcription through TRE is mediated by 
the transcription activation domain from HSV VP16 
(Figure 3A), this result also suggests that transcription factors 
may have an important role in the regulation of polyA site 
choice by transcriptional activity (see Discussion for more on 
this point). 



Nuclear run-on data support coupling of 3' end 
processing to transcription 

RNA polymerase II (Pol II) pauses at the polyA site for 3' end 
processing (Nag et al, 2007; Glover-Cutter et al, 2008; West and 
Proudfoot, 2009), which can be detected by the nuclear run-on 
(NRO) method (Core et al, 2008; West and Proudfoot, 2009). 
We reasoned that if polyA site usage was coupled to 
transcriptional activity, Pol II pausing at the polyA site would 
be different for genes expressed at different levels. To this end, 
we carried out NRO with pTRE-RIF under low and high 
induction conditions using BrUTP to label nascent transcripts. 
Labeled nascent transcripts were immunoprecipitated and 
analyzed by qRT-PCR with primer sets targeting different 
regions of the reporter gene (Figure 3A). Consistent with the 
difference in total RNA expression, qRT-PCR of NRO RNA 
showed more nascent transcripts from the reporter gene when 
it was highly induced than when it was lowly induced 
(Figure 3D). 

The qRT-PCR value for a given region also represents Pol II 
density in the region, reflecting its pausing kinetics. As shown 
in Figure 3E, we found that Pol II density was generally higher 
across the whole reporter gene when it was highly induced. 
A prominent peak can be discerned at the proximal polyA site 
under both induction conditions, suggesting significant pausing 
of Pol II in the region. Importantly, the difference in Pol II 
density at the proximal polyA site between high and low 
induction conditions was significantly greater than that at other 
regions (Figure 3F). Combined with the luciferase results and 
qRT-PCR data using total RNA, the NRO results indicate that 
Pol II pausing at the polyA site (1) correlates with polyA site 
usage and (2) can be modulated by transcriptional activity. 
Since NRO is not affected by mRNA stability, a common issue 
in analysis of steady-state mRNAs, this finding directly 
supports the notion that polyA site usage is coupled to 
transcriptional activity. 
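
The comparison in Figure 3F rests on a two-step normalization described in the figure legend: each amplicon is first normalized to amplicon 4 (the proximal polyA site), and the high-induction value is then normalized to the low-induction value. A minimal sketch of that arithmetic follows; the function names and the signal values are hypothetical and serve only to show the calculation.

```python
def normalize_to_reference(signal_by_amplicon, reference):
    """Divide every amplicon signal by the signal of the reference amplicon."""
    ref = signal_by_amplicon[reference]
    if ref == 0:
        raise ValueError("reference amplicon has zero signal")
    return {amp: value / ref for amp, value in signal_by_amplicon.items()}


def high_relative_to_low(low, high, reference=4):
    """Normalize each condition to the reference amplicon, then take high/low.

    Values below 1 mean the amplicon gained less signal upon induction
    than the reference amplicon did.
    """
    low_norm = normalize_to_reference(low, reference)
    high_norm = normalize_to_reference(high, reference)
    return {amp: high_norm[amp] / low_norm[amp] for amp in low_norm}


# Made-up NRO qRT-PCR signals keyed by amplicon number.
low_dox = {1: 1.0, 2: 0.8, 4: 2.0, 5: 0.5}
high_dox = {1: 3.0, 2: 2.4, 4: 12.0, 5: 1.5}
relative = high_relative_to_low(low_dox, high_dox)
print(relative)
```

By construction the reference amplicon comes out as 1, so the other amplicons can be read directly as gaining more or less signal than the proximal polyA site.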

To examine regulation of 3' end processing by transcription 
for endogenous genes, we carried out a global NRO experiment 
using deep sequencing (GRO-seq), similar to the method 
developed by the Lis group (Core et al, 2008). We obtained 
over 12.5 million uniquely mapped strand-specific GRO-seq 
reads for the nascent transcripts generated in NRO of C2C12 
cells (see Materials and methods for details). We first 
examined genes with only one polyA site in the 3'-most exon 
(named single polyA site, Figure 4A). Consistent with the 
findings reported by the Lis group (Core et al, 2008), we 
observed two GRO-seq read peaks around the polyA site 
(Figure 4B): one spanned the 1-kilobase (kb) upstream region 
of the polyA site, peaking right before the polyA site; and the 
other spanned the 4-kb downstream region. As indicated 
previously (Core et al, 2008), the first peak corresponds to Pol 
II pausing at the polyA site, or polyA pausing, and the second 
one corresponds to Pol II pausing before termination, or pre- 
termination pausing. Using GRO-seq read density in the 
transcribed region to represent gene expression level (the first 
1-kb region at the 5' end was excluded to minimize the influence of 
reads resulting from Pol II pausing at the promoter), we found 
that polyA pausing was more pronounced for highly expressed 
genes than for lowly expressed ones (Figure 4B and PA/GB in 
Figure 4C). This trend was also discernable for the ratio of 
polyA pausing to pre-termination pausing (PA/PT in 
Figure 4C). Interestingly, we found a shift of the pre- 
termination pausing peak toward the polyA site as gene 
expression level increased (see arrows in Figure 4B, and 
comparison of the first and second halves of the pre-termination 
pausing region in Figure 4C), suggesting that Pol II 
may terminate more rapidly on highly expressed genes than 
on lowly expressed ones. 
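
The PA/GB and PA/PT values discussed above are ratios of read densities between regions. The sketch below shows one way to compute them from read counts, using reads per kilobase per million mapped reads (RPKM) as in the Figure 4 legend. The window sizes in the example (1 kb upstream of the polyA site for PA, 4 kb downstream for PT) follow the peak descriptions in the text, but the exact region definitions are in Materials and methods; the function names and read counts are hypothetical.

```python
def rpkm(read_count, region_length_nt, total_mapped_reads):
    """Reads per kilobase of region per million mapped reads."""
    if region_length_nt <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    return read_count / (region_length_nt / 1e3) / (total_mapped_reads / 1e6)


def pausing_ratios(pa, gb, pt, total_mapped_reads):
    """Return PA/GB and PA/PT read density ratios for one gene.

    Each of pa, gb, and pt is a (read_count, region_length_nt) pair.
    A ratio is None when its denominator has no reads.
    """
    pa_density = rpkm(pa[0], pa[1], total_mapped_reads)
    gb_density = rpkm(gb[0], gb[1], total_mapped_reads)
    pt_density = rpkm(pt[0], pt[1], total_mapped_reads)
    return {
        "PA/GB": pa_density / gb_density if gb_density else None,
        "PA/PT": pa_density / pt_density if pt_density else None,
    }


# Made-up counts for a single gene.
ratios = pausing_ratios(pa=(100, 1000), gb=(400, 10000), pt=(200, 4000),
                        total_mapped_reads=12_500_000)
print(ratios)
```

Because both densities in a ratio come from the same library, the library size cancels; it is kept here so that the intermediate densities remain comparable across samples.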

We next examined GRO-seq reads around alternative polyA 
sites. We focused on the 5'-most and 3'-most polyA sites in the 
3'-most exons (Figure 4D). Consistent with the notion that the 
promoter-proximal polyA site is more likely to be used when 
gene expression is high, both Pol II pausing at the proximal 
Figure 3 Pol II pausing at the polyA site correlates with polyA site usage and is regulated by transcriptional activity. (A) The pTRE-RIF vector used in this study. PTRE, 
promoter containing the tetracycline response element; Dox, doxycycline; TetR, tetracycline repressor; VP16, the transcription activation domain of HSV VP16. Renilla, 
on amplicons 5 and 6 as shown in (A). Error bars are standard deviation based on multiple amplicons. (D) qRT-PCR analysis of nascent RNA using cells treated with low 
or high doses of Dox. The Renilla value based on amplicons 1, 2, and 3 was calculated, and its value for high induction was normalized to that for low induction (set to 1). 
(E) qRT-PCR analysis of NRO transcripts with different amplicons using cells treated with low or high doses of Dox. All amplicons were normalized to amplicon 1 (set to 
1). (F) Data in (E) were reanalyzed to show that Pol II pausing at the proximal polyA site (amplicon 4) had the biggest difference between high induction of expression 
and low induction of expression. Each amplicon was first normalized to amplicon 4, and then the normalized value for high induction was normalized to that for low 
induction. Thus, the low induction value is 1 for all amplicons. Source data is available for this figure in the Supplementary Information. 



polyA site (PA (proximal)/GB in Figure 4E) and the ratio of 
pausing between proximal and distal polyA sites (PA (proximal)/PA 
(distal) in Figure 4E) were much greater for highly 
expressed genes than for lowly expressed ones. Since 
measurement of Pol II pausing at the distal site may be 
affected by pre-termination pausing resulting from usage of 
upstream polyA sites, we calculated the ratio of pre-termination 
pausing after the distal polyA site to gene body. 
Interestingly, the ratio was significantly lower for highly 
expressed genes than for lowly expressed ones (PT (distal)/GB 
in Figure 4E), suggesting a greater loss of Pol II before reaching 
the distal polyA site for highly expressed genes. Taken 
together, our GRO-seq results based on endogenous genes 
further indicate that Pol II is more likely to pause at proximal 
polyA sites when genes are highly expressed. Notably, 
consistent results were also obtained using the data published 
by the Lis group, which was based on a human cell line 
(Supplementary Figure 4). 
Figure 4 Global nuclear run-on deep sequencing (GRO-seq) indicates that Pol II pausing at the polyA site is coupled to transcriptional activity. (A) Schematic of using 
GRO-seq data in this study. GRO-seq reads mapped to the sense strand of genes were divided into four groups: 5' reads, polyA site region (PA) reads, gene body (GB) 
reads, and pre-termination (PT) reads (see Materials and methods for details). (B) GRO-seq read density in the 3' end region of genes with a single polyA site. Read 
density was presented as reads per kilobase per million mapped reads, or RPKM. Expressed genes were divided into three groups based on expression level, that is, 
low, medium, and high. Only data for low and high groups are shown. Arrows in the figures indicate peaks in the PT region. (C) Ratio of GRO-seq read density between 
different regions for genes expressed at different levels. The regions compared are indicated in each graph. Error bars are 90% confidence intervals and the P-values 
indicate difference between highly and lowly expressed genes (see Materials and methods for details). (D) GRO-seq read density around 5'-most (proximal) and 3'-most 
(distal) polyA sites in the 3'-most exon of lowly expressed genes (top) and highly expressed genes (bottom). (E) As in (C), ratio of GRO-seq read density between 
different regions for genes expressed at different levels. The regions compared are indicated in each graph. Only the 200-nt upstream region of the polyA site was used 
for the PA (proximal)/PA (distal) plot. 



Regulation of APA by transcriptional activity 
correlates with differences in nucleosome 
positioning and histone methylations around 
alternative polyA sites 

Transcriptional activity is known to impact epigenetic features 
such as nucleosome positioning and histone modifications 
(Barski et al, 2007; Campos and Reinberg, 2009; Schwartz et al, 
2009). We reasoned that alteration of epigenetic features at 
alternative polyA sites would indicate change of transcrip- 
tional activity at these sites, supporting alternative polyA site 
usage. Since epigenetic features are not affected by RNA 
metabolism, such as stability, the result would address the 
connection between polyA site usage and transcriptional 
activity from a different perspective. 




We first analyzed a data set generated by digestion of 
chromatin DNA from human resting T cells with micrococcal 
nuclease (Schones et al, 2008). Consistent with previous 
reports (Mavrich et al, 2008; Kaplan et al, 2009; Spies et al, 
2009), depletion of nucleosome around the polyA site was 
detected for both proximal and distal polyA sites (Figure 5A). 
Interestingly, this depletion could be predicted using a 
computational model that was based solely on nucleotide 
content (Kaplan et al, 2009; Figure 5A, bottom), indicating 
an important contribution of nucleotide composition to nucleosome 
positioning around the polyA site. However, highly 
expressed genes had a lower nucleosome level around the 
polyA site than lowly expressed genes, exceeding the 
difference predicted by the computational model, particularly 
in regions >200 nucleotides (nt) upstream or downstream of 
Figure 5 Nucleosome positioning and histone modifications around alternative polyA sites in genes expressed at different levels. (A) Top, nucleosome levels around 
the proximal (left) and distal (right) polyA sites in genes expressed at different levels. Proximal and distal polyA sites are the 5'-most and 3'-most sites in the 3'-most exon, 
respectively. The nucleosome level was based on data from Schones et al (2008) using human resting T cells. Nucleosome level is the average number of reads mapped 
to a position relative to the polyA site normalized to the total mappable read number in the sample. Genes were divided into three groups based on expression level, that 
is, low, medium, and high, as indicated by different colored lines. Bottom, predicted nucleosome levels around proximal (left) and distal (right) polyA sites using the 
computational model reported by Kaplan et al (2009). (B) Ratio of nucleosome level in the 500-nt upstream region of proximal polyA site to that of distal site for genes 
expressed at different levels. (C) As in (B) except that predicted nucleosome levels were analyzed. Error bars are 90% confidence intervals and P-values indicate 
difference between highly and lowly expressed genes (see Materials and methods for details). (D) As in (A), except that H3K36me3 (top) and H3K4me3 (bottom) levels 
around the proximal (left) and distal (right) polyA sites were plotted. H3K36me3 and H3K4me3 levels were based on data from Barski et al (2007) using human resting T 
cells. (E) As in (B), except that H3K36me3 levels were analyzed. (F) As in (B), except that H3K4me3 levels were analyzed. See also Supplementary Figure 5 for 
H3K36me1, H3K4me1, and H3K4me2 in resting T cells, and H3K36me3 and H3K4me3 in mouse embryonic fibroblasts and neuronal progenitor cells. 



the polyA site. This result indicates transcriptional activity has 
an additional impact on nucleosome level around the polyA 
site. Importantly, highly expressed genes had a lower ratio of 
nucleosome level between proximal and distal polyA sites 
(upstream 500 nt used for analysis) than lowly expressed 
genes (Figure 5B). Since this difference could not be discerned 
using predicted nucleosome levels (Figure 5C), the nucleo- 
some level difference between alternative polyA sites in genes 
expressed at different levels supports the notion that polyA site 
choice is connected to transcriptional activity. 

To examine how regulation of APA by transcriptional activity 
correlates with changes in histone methylation levels, we 
analyzed a ChIP-seq data set for human resting T cells. As 
expected, H3K36me3 and H3K4me3 levels around the polyA site 
correlated with gene expression level (Figure 5D), although 
there are differences in their profiles. The H3K36me3 profiles 
showed a drop at the polyA site, presumably attributable to 
depletion of nucleosomes. Notably, the ratio of H3K36me3 
level in the upstream region of proximal polyA site (within 
500 nt) to that of distal site was significantly greater for highly 
expressed genes than for lowly expressed ones (Figure 5E). 
This pattern was also corroborated by analysis of H3K36me3 
levels in mouse embryonic fibroblasts (MEFs) and neuronal 
progenitor cells (NPCs), but was not discernible for H3K36me1 
(Supplementary Figure 5). 

Similar to the H3K36me3 result, the ratio of H3K4me3 level 
in the upstream region of proximal polyA site to that of distal 
site was significantly higher for highly expressed genes as 
compared with lowly expressed ones (Figure 5F). Interestingly, a 
marked drop of H3K4me3 level after the proximal polyA site can 
be discerned for highly expressed genes. The H3K4me3 result 
was also confirmed by using data for MEFs and NPCs 
(Supplementary Figure 5). A similar trend was found for H3K4me2, 
albeit less significant, but not for H3K4me1 (Supplementary 
Figure 5). Taken together, our results from analysis of epigenetic 
features indicate that polyA site choice is connected to 
transcriptional activity and leaves epigenetic signatures. 



Discussion 

Here, we report a general correlation between the relative 
abundance of APA isoforms and gene expression level in 
human and mouse transcriptomes, and present several lines of 
evidence indicating that transcriptional activity regulates 
polyA site choice. The correspondence between APA and gene 
expression revealed in our study may be responsible for the 
coupled usage of alternative promoters and polyA sites 
previously reported for some genes (Costessi et al, 2006; 
Winter et al, 2007), and contribute to tissue-specific and 
condition-specific APA events involving transcription factors, 
such as neuronal activity-dependent polyA site selection 
mediated by MEF2 (Flavell et al, 2008). 

Regulation of polyA site choice by transcriptional activity 
results in preferential expression of long 3'UTR isoforms when 
gene expression is low and short 3'UTR isoforms when gene 
expression is high. Since short 3'UTRs are generally more stable 
due to avoidance of destabilizing elements in 3'UTRs (Mayr and 
Bartel, 2009) and escape from cellular mechanisms degrading 
long 3'UTRs (Hogg and Goff, 2010), more frequent production of 
short 3'UTR isoforms would make the overall expression level of 
a gene even higher and, conversely, more frequent production of 
long 3'UTR isoforms would have the opposite effect. Therefore, 
this mechanism can magnify the final outcome of transcrip- 
tional regulation (Figure 6): high transcriptional activity leads to 
more expression of mRNA isoforms with long half-lives and 
therefore high protein production capabilities; and low tran- 
scriptional activity leads to more expression of isoforms with 
short half-lives, and therefore low protein production capabil- 
ities. Conceivably, this mechanism can facilitate swift gene 
expression changes, which can be critical when cells respond to 
developmental and environmental signals. 

Our result also suggests that downregulation of gene 
expression at the transcriptional level may lead to more 
unprocessed pre-mRNAs as a result of reduced polyA site usage 
(Figure 6). This could lead to further inhibition of gene 
expression, because unprocessed pre-mRNAs are subject to 
degradation by the nuclear exosome (Houseley et al, 2006; 
Lykke-Andersen et al, 2009). On this note, a global analysis of 
RNA expression by tiling arrays indicated that most regions of 
the human genome can be transcribed (Johnson et al, 2005). 
However, intergenic transcripts outside the well-annotated 
genes generally have very low abundance (van Bakel et al, 
2010). It is possible that transcription of some of these RNA 
species may result from the non-specific activity of Pol II, 
which, due to lack of coupling to 3' end processing, leads to 
unprocessed transcripts that are rapidly degraded. Therefore, 
coupling 3' end processing to transcription may have a role in 
Figure 6 A model for regulation of APA by transcription and its impact on gene 
expression. A hypothetical gene is shown as a box with the transcription start site 
(TSS) and polyA sites (pA) indicated. High and low transcriptional activities are 
indicated by thick and thin curved arrows, respectively. Pol II is shown as a yellow 
oval. High and low polyA site usage is indicated by large and small lightning 
symbols. This model shows that high transcriptional activity leads to more polyA 
site usage, resulting in relatively higher expression of short 3'UTR isoforms 
whereas low transcriptional activity leads to lower polyA site usage, resulting in 
either relatively higher expression of long 3'UTR isoforms or unprocessed 
pre-mRNAs that are subject to degradation in the nucleus. Since alternative 3'UTRs 
typically contain destabilizing elements, as indicated by various previous studies 
(Mayr and Bartel, 2009; Hogg and Goff, 2010), short 3'UTR isoforms are more 
stable and have higher protein expression capability than long isoforms. 



augmenting the difference between specifically activated 
transcription and transcriptional noise in the cell. 

A number of potential molecular mechanisms need to be 
considered to explain regulation of polyA site choice by 
transcriptional activity. First, transcription elongation, which 
can be controlled by gene promoters/enhancers, has been 
shown to regulate pre-mRNA splicing (Kornblihtt, 2007). 
However, unless the elongation rate consistently correlates with 
transcriptional activity, it is not likely to be a general mechanism 
responsible for global regulation of polyA site choice at different 
gene expression levels. Second, previous studies have shown 
that the CTD of Pol II is critical for coupling pre-mRNA 
processing to transcription (McCracken et al, 1997; Ahn et al, 
2004; Meinhart and Cramer, 2004; Adamson et al, 2005). 
Phosphorylation of CTD is important for recruitment of polyA 
factors to the 3' end of genes (Ahn et al, 2004). It remains to be 
seen, however, whether highly expressed genes have different 
CTD phosphorylation patterns than lowly expressed ones, 
leading to differential recruitment of polyA factors at the 3' end. 

Third, transcription factors can regulate mRNA polyadeny- 
lation. For example, some transcription factors were shown to 
stimulate PSF (Rosonina et al, 2005), which has a role in 3' end 
processing and termination of transcription (Liang and Lutz, 
2006; Kaneko et al, 2007). Notably, Nagaike et al (2011) have 
recently provided biochemical evidence in vitro that transcrip- 
tion activators stimulate transcription-coupled 3' end proces- 
sing through direct interaction with the PAFc, which was 
previously shown to be involved in 3' end formation of yeast 
mRNAs and snRNAs (Penheiter et al, 2005; Sheldon et al, 
2005), and to bind CPSF and CstF (Rozenblatt-Rosen et al, 
2009). Interestingly, PAFc has also been shown to regulate 
histone methylation (Zhu et al, 2005; Jaehning, 2010), which 
appears consistent with our finding that the relative levels of 
H3K4me3 and H3K36me3 around alternative polyA sites are 
distinct for genes expressed at different levels. Therefore, it is 
plausible that the correspondence between APA and gene 
expression we observed here is due to recruitment of polyA 
factors to Pol II at the promoter region by transcription factors. 
This mechanism would make Pol II more 'prepared' to engage in 
3' end processing when a polyA site is encountered. Future 
studies need to address whether different transcription 
factors use different mechanisms to recruit polyA factors. 

The correlation between expression level and differences in 
the patterns of nucleosome positioning and histone methylation 
(H3K4me3 and H3K36me3) around alternative polyA sites not 
only supports the notion that polyA site choice is regulated by 
transcriptional activity but also raises the question as to whether 
epigenetic features can, in return, modulate APA, thereby 
forming a feedback regulatory mechanism. Histone methylation 
has been shown to facilitate pre-mRNA splicing by recruiting the 
spliceosome (Sims et al, 2007) and to regulate alternative 
splicing by affecting splicing regulators (Luco et al, 2010). DNA 
methylation, another type of epigenetic mark, has been shown 
to regulate APA in imprinted genes (Wood et al, 2008). It will be 
interesting in the future to examine how nucleosome remodel- 
ing and different types of histone methylation have roles in APA 
and its coupling to transcription. 

Materials and methods 

Data sets 

Information about the data sets used in this study is listed in 
Supplementary Table 1. PolyA site information was obtained from 
PolyA_DB (Lee et al, 2007). 



Analysis of RNA-seq data 

Paired-end reads were mapped to the hg18 genome using Tophat (Trapnell 
et al, 2009) with default parameters. Only uniquely mapped and 
properly paired reads were used for subsequent analyses. Properly 
paired reads are those whose two mates map to different 
strands of the same chromosome. See Supplementary Table 2 for the 
number of reads used. Gene expression levels were calculated using 
read density in the protein-coding region based on the reads per 
kilobase of mappable region per million mapped reads (RPKM) 
method (Mortazavi et al, 2008). A cutoff of RPKM=1 was used to select 
expressed genes, which resulted in similar numbers of expressed genes 
in different tissues to those reported by Ramskold et al (2009). The top 
25%, middle 50%, and bottom 25% of expressed genes with respect to 
RPKM were considered as having high, medium, and low expression, 
respectively. The score for relative expression of isoforms using distal 
polyA sites (RUD) was based on the ratio of read density in aUTR to 
that in cUTR (Figure 1A). Therefore, a high RUD value indicates higher 
abundance of long 3'UTR isoform resulting from usage of promoter- 
distal polyA sites relative to short 3'UTR isoforms resulting from usage 
of promoter-proximal polyA sites. The transcription direction for each 
read pair was inferred based on overlap with RefSeq sequences and the 
3' end of the mapped read pair was used for RUD calculation. We 
required that the 3'UTR of a surveyed gene not overlap with any regions 
of other genes, regardless of transcription direction. This can minimize 
interference from antisense transcripts. 
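
The quantities defined above can be illustrated with a minimal Python sketch. It is not the code used in the study: the function names (`rpkm`, `expression_groups`, `rud`), the choice to treat a gene with RPKM equal to the cutoff as expressed, and the handling of ties at the quartile boundaries are all assumptions made for illustration.

```python
def rpkm(read_count, mappable_length_nt, total_mapped_reads):
    """Reads per kilobase of mappable region per million mapped reads."""
    return read_count / (mappable_length_nt / 1e3) / (total_mapped_reads / 1e6)


def expression_groups(gene_rpkm, cutoff=1.0):
    """Label expressed genes as low (bottom 25%), medium (middle 50%), or high (top 25%).

    Assumption: genes with RPKM >= cutoff count as expressed; ties are
    ordered arbitrarily by the sort.
    """
    expressed = sorted(
        (gene for gene, value in gene_rpkm.items() if value >= cutoff),
        key=lambda gene: gene_rpkm[gene],
    )
    n = len(expressed)
    groups = {}
    for rank, gene in enumerate(expressed):
        fraction = rank / n  # 0 for the lowest expressed gene
        if fraction < 0.25:
            groups[gene] = "low"
        elif fraction < 0.75:
            groups[gene] = "medium"
        else:
            groups[gene] = "high"
    return groups


def rud(autr_reads, autr_length_nt, cutr_reads, cutr_length_nt):
    """Ratio of read density in aUTR to read density in cUTR."""
    return (autr_reads / autr_length_nt) / (cutr_reads / cutr_length_nt)
```

Under these assumptions, a gene with 50 reads in a 1000-nt aUTR and 200 reads in a 500-nt cUTR has a RUD of 0.125, that is, the short 3'UTR isoform predominates.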



Analysis of exon array data 

Exon array data were normalized by the Robust Multichip Average 
(RMA) method and were corrected for hybridization bias using the 
COSIE program (Gaidatzis et al, 2009). Expressed genes were selected 
by the Detection Above Background (DABG) method. Gene expression 
level was calculated using probesets in constitutive exons, based on 
NCBI cDNAs/ESTs (Lee et al, 2008). The RUD score was based on the 
ratio of average probeset intensity in aUTR to that in cUTR, as 
previously described (Ji et al, 2009). For both human and mouse data 
sets, we first calculated the mean for each probeset across replicates and 
then combined all samples for analysis. For gene expression regulation 
in differentiation of C2C12 cells, 1 × standard deviation of log2(ratio of 
expression) was used to group differentially expressed genes. APA 
regulation was based on analysis of RUD values using the Significance 
Analysis of Microarray (SAM) method with FDR < 0.05 as cutoff for 
selection of significant events, as previously described (Ji et al, 2009). 
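
The grouping of differentially expressed genes by one standard deviation of the log2 expression ratio can be sketched as follows. This is an illustrative reading, not the authors' script: the function name `group_by_log2_ratio`, the use of the sample standard deviation, and the placement of the thresholds at plus and minus one standard deviation around zero are assumptions.

```python
import math
import statistics


def group_by_log2_ratio(expr_before, expr_after):
    """Group genes as 'up', 'down', or 'unchanged' by 1 x SD of log2(ratio).

    Assumptions: thresholds are +/- 1 sample standard deviation around zero,
    and both dictionaries hold positive expression values for the same genes.
    """
    log_ratios = {
        gene: math.log2(expr_after[gene] / expr_before[gene])
        for gene in expr_before
    }
    sd = statistics.stdev(list(log_ratios.values()))
    groups = {}
    for gene, value in log_ratios.items():
        if value > sd:
            groups[gene] = "up"
        elif value < -sd:
            groups[gene] = "down"
        else:
            groups[gene] = "unchanged"
    return groups, sd
```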



Analysis of GRO-seq data 

GRO-seq reads for C2C12 cells were mapped to the mouse genome 
(mm9) using Bowtie (Langmead et al, 2009) allowing up to three 
mismatches. Unmapped reads were trimmed to the first 38 nt and 
mapped to the genome again by Bowtie allowing up to three 
mismatches. This approach resulted in 12 511 052 uniquely mapped 
reads (69% of total). The 5' end position of each read was used to 
indicate its location. GRO-seq reads were examined in four regions of a 
gene: (1) 5' region, 1 kb downstream of the transcription start site 
(TSS); (2) polyA region (PA), 1 kb upstream of the polyA site; (3) gene 
body region (GB), whole gene region excluding 5' and PA regions; 
(4) pre-termination region (PT), 4 kb downstream of the polyA site. 
When there were multiple polyA sites in the 3'-most exon, the 5'-most 
site was used to define GB. Gene expression level was based on read 
density (RPKM) in GB and PA. RPKM > 0.04 was used as cutoff to select 
expressed genes. The top 25%, middle 50%, and bottom 25% of 
expressed genes with respect to RPKM were considered as having high, 
medium, and low expression, respectively. To minimize interference 
from downstream genes, we selected only genes that were not 
followed by any RefSeq-supported genes with the same transcriptional 
direction in the 6-kb downstream region. For analysis of alternative 
polyA sites, we selected only the distal polyA sites that were not 
preceded by any polyA sites in the upstream 400 nt region and used 
only the upstream 200 nt region of proximal or distal polyA sites. The 
same method was used to analyze the GRO-seq data generated by the 
Lis group for IMR90 cells, except that the PT region was 3 kb. 
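
The four gene regions used for the GRO-seq analysis can be written down as coordinate intervals. The sketch below is only an illustration of those definitions: the function names (`gene_regions`, `region_rpkm`), the half-open interval convention, and the decision to place the polyA site position itself in the PT region are assumptions, not details given in the text.

```python
def gene_regions(tss, polya, strand, pt_length=4000, flank=1000):
    """Return half-open intervals (start, end) for the 5', GB, PA, and PT regions.

    Assumptions: coordinates are integers on one chromosome, the TSS position
    belongs to the 5' region, and the polyA site position belongs to PT.
    """
    if strand == "+":
        return {
            "5prime": (tss, tss + flank),
            "GB": (tss + flank, polya - flank),
            "PA": (polya - flank, polya),
            "PT": (polya, polya + pt_length),
        }
    # On the minus strand, downstream means decreasing coordinates,
    # so the plus-strand layout is mirrored.
    return {
        "5prime": (tss - flank + 1, tss + 1),
        "GB": (polya + flank + 1, tss - flank + 1),
        "PA": (polya + 1, polya + flank + 1),
        "PT": (polya - pt_length + 1, polya + 1),
    }


def region_rpkm(read_positions, region, total_mapped_reads):
    """Read density (RPKM) in a region, locating each read by its 5' end."""
    start, end = region
    count = sum(1 for pos in read_positions if start <= pos < end)
    length_kb = (end - start) / 1e3
    return count / length_kb / (total_mapped_reads / 1e6)
```

For the IMR90 data described above, the same sketch would be called with `pt_length=3000`.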

Analysis of nucleosome positioning and histone 
methylation patterns 

The data for nucleosome positioning and histone methylation were 
from Schones et al (2008) and Barski et al (2007), respectively. For the 
nucleosome positioning data, we extended reads to 147 bp, the length 
of DNA bound by a full nucleosome. Gene expression levels were 
based on microarray data of human T cells using the Affymetrix MAS5 
method. The bottom and top 20% of genes with respect to probe 
intensity values were considered as lowly and highly expressed genes, 
respectively. The ratio of read density in the 500-nt upstream region of 
proximal polyA site to that of distal site was calculated to indicate 
relative level at the proximal versus distal polyA sites. Nucleosome 
level prediction using nucleotide content was based on a program from 
http://genie.weizmann.ac.il/software/nucleo_genomes.html (Kaplan 
et al, 2009). Histone modification patterns in MEFs and NPCs were 
analyzed by the same method. 
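
As a rough illustration of the read extension and the proximal-to-distal ratio described above, the following sketch works on a plus-strand gene. The function names, the half-open interval convention, and the use of covered bases as the measure of read density are assumptions for illustration; because both windows are 500 nt long, the ratio of covered bases equals the ratio of densities.

```python
NUCLEOSOME_BP = 147  # length of DNA bound by a full nucleosome


def extend_read(five_prime_pos, strand, length=NUCLEOSOME_BP):
    """Half-open interval covered by a read extended from its 5' end."""
    if strand == "+":
        return (five_prime_pos, five_prime_pos + length)
    return (five_prime_pos - length + 1, five_prime_pos + 1)


def coverage_in_window(intervals, window):
    """Total number of bases by which the intervals overlap the window."""
    w_start, w_end = window
    return sum(
        max(0, min(end, w_end) - max(start, w_start)) for start, end in intervals
    )


def proximal_distal_ratio(intervals, proximal_site, distal_site, upstream=500):
    """Ratio of coverage upstream of the proximal site to that of the distal site.

    Assumption: the gene is on the plus strand, so upstream means smaller
    coordinates. The distal window must have non-zero coverage.
    """
    proximal = coverage_in_window(intervals, (proximal_site - upstream, proximal_site))
    distal = coverage_in_window(intervals, (distal_site - upstream, distal_site))
    return proximal / distal
```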



Statistical analysis 

To compare various features between highly and lowly expressed genes, 
we used a data resampling method based on bootstrapping (Venables and 
Ripley, 2002). This was carried out by resampling genes in the two groups 
being compared 1000 times to derive a P-value based on how many times one 
group has a higher or lower mean value than the other. This method was 
also used to derive 90% confidence intervals for various data. 
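
A minimal sketch of such a bootstrap comparison is given below. It is one plausible reading of the description, not the authors' implementation: the function name `bootstrap_compare`, the one-sided definition of the P-value, the percentile method for the 90% interval, and the fixed random seed are assumptions.

```python
import random
import statistics


def bootstrap_compare(group_a, group_b, n_resamples=1000, seed=0):
    """Compare the means of two groups by resampling with replacement.

    Returns (p_value, (low, high)), where p_value is the fraction of
    resamples in which the mean of group_a is NOT higher than that of
    group_b, and (low, high) is a 90% percentile interval for the
    difference of means. Both are illustrative assumptions.
    """
    rng = random.Random(seed)
    not_higher = 0
    diffs = []
    for _ in range(n_resamples):
        mean_a = statistics.fmean(rng.choices(group_a, k=len(group_a)))
        mean_b = statistics.fmean(rng.choices(group_b, k=len(group_b)))
        diffs.append(mean_a - mean_b)
        if mean_a <= mean_b:
            not_higher += 1
    diffs.sort()
    low = diffs[n_resamples * 5 // 100]
    high = diffs[n_resamples * 95 // 100 - 1]
    return not_higher / n_resamples, (low, high)
```

With 1000 resamples, the smallest non-zero P-value this procedure can report is 0.001.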



Gene density plot 

We used the gene density plot to examine the correlation between the 
relative abundance of 3'UTR isoforms and gene expression level. As 
described above, the RUD score was used to represent the relative 
abundance of APA isoforms. In the gene density plot, genes expressed 
in each sample were summarized in an N × N table according to the 
RUD rank (row) and gene expression rank (column) across a sample 
set, where N is the number of samples in the set. The number of genes 
in each cell of the table is an observed value. The expected number of 
genes for each cell was calculated using randomized data with shuffled 
RUD and gene expression ranks. The ratio of observed number of 
genes (Obs) to expected number of genes (Exp) of a cell indicates the 
extent of enrichment (when ratio > 1), or of depletion (when ratio 
< 1), of genes in the cell; and the log2(ratio) values of the table are 
represented in a heat map using R. 
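
The construction of the observed and expected tables can be sketched in Python (the heat map itself was drawn in R and is not reproduced here). The sketch is a simplified illustration: the function names, the tie-breaking rule in `ranks`, the number of shuffles, and the omission of the per-sample filter for expressed genes are assumptions.

```python
import math
import random


def ranks(values):
    """Rank values from 0 (lowest) to N-1 (highest); ties broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result


def count_table(row_ranks, col_ranks, n):
    """Count (gene, sample) pairs in each cell of an N x N rank table."""
    table = [[0] * n for _ in range(n)]
    for gene_rows, gene_cols in zip(row_ranks, col_ranks):
        for r, c in zip(gene_rows, gene_cols):
            table[r][c] += 1
    return table


def density_log2_ratio(rud_by_gene, expr_by_gene, n_shuffles=100, seed=0):
    """log2(Obs/Exp) per cell; rows are RUD ranks, columns are expression ranks.

    Each input is a list with one entry per gene, holding N per-sample values.
    Cells with a zero observed or expected count are reported as None.
    """
    n = len(rud_by_gene[0])
    rud_ranks = [ranks(v) for v in rud_by_gene]
    expr_ranks = [ranks(v) for v in expr_by_gene]
    observed = count_table(rud_ranks, expr_ranks, n)

    rng = random.Random(seed)
    expected = [[0.0] * n for _ in range(n)]
    for _ in range(n_shuffles):
        # Shuffle both sets of ranks within each gene to break their pairing.
        shuffled_rud = [rng.sample(r, n) for r in rud_ranks]
        shuffled_expr = [rng.sample(r, n) for r in expr_ranks]
        table = count_table(shuffled_rud, shuffled_expr, n)
        for i in range(n):
            for j in range(n):
                expected[i][j] += table[i][j] / n_shuffles

    return [
        [
            math.log2(observed[i][j] / expected[i][j])
            if observed[i][j] > 0 and expected[i][j] > 0
            else None
            for j in range(n)
        ]
        for i in range(n)
    ]
```

In this layout, an inverse relationship between RUD and expression appears as enrichment (positive values) along the anti-diagonal of the table.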



Constructs used in this study 

Constructs containing P_TAL, P_CRE, P_NFκB, P_GRE, P_HSE, and P_SRE were 
constructed by replacing the CMV promoter (P_CMV) in pRiG-77S.AD (Ji 
et al, 2009) with fragments containing promoter sequences from 
corresponding constructs included in the Mercury™ Pathway Profiling 
System (Clontech) by PCR (primers: 5'-GCGCATTAATGAGCTCTTACGCGTTCTAGC 
and 5'-CGATGCTAGCCGATTCGAAGCTTCTGCTTC) 
and restriction enzymes AseI and NheI. Constructs containing P_E2F 
and P_MyoG were constructed by replacing the CMV promoter in 
pRiG-77S.AE (Ji et al, 2009) with respective DNA fragments and restriction 
enzymes AseI and NheI. P_MyoG was derived from the myogenin 
promoter (-395 to +39 nt surrounding the TSS; Edmondson et al, 
1992) using PCR (primers: 5'-CGATATTAATGGATTTTCAAGACCCCTTCC 
and 5'-GGCCGCTAGCAAGGCTTGTTCCTGCCACT) and C2C12 
genomic DNA, and P_E2F was derived from the E2F luciferase reporter 
vector (Panomics) using PCR (primers: 5'-CGATATTAATCTAGCCTTGGCGGGAGATA 
and 5'-GGCCGCTAGCTTACCAACAGTACCGGAATGC). 
The pTRE-RIF vector was constructed by cloning a fragment 
encoding the Renilla luciferase from pRL-CMV (Promega), a fragment 
encoding the firefly luciferase from pGL3-Basic (Promega), and a 
fragment containing a polyA site and an IRES sequence from 
pRiG-77S.AD into the pTRE-Tight vector (Clontech). 



pcDNA3.1 vector (Invitrogen) using restriction enzymes (XhoI and 
BamHI). The 32P-labeled antisense RNA probe was generated by 
in vitro transcription using the MAXIscript kit (Ambion). RPA assays 
were carried out using the RPA III kit (Ambion). 



Quantitative real-time reverse transcription PCR 

For qRT-PCR, total cellular RNA was treated with DNase I and reverse 
transcribed using the oligo-dT primer. qRT-PCR was carried out using 
the Maxima SYBR Green/Rox qPCR Master Mix (Fermentas) with 
primers targeting RFP (5'-GCCCCGTAATGCAGAAGAAG and 5'-CTTC 
AGGGCCTTGTGGATCT), EGFP (5'-GGGCACAAGCTGGAGTACAACT 
and 5'-ATGTTGTGGCGGATCTTGAAG), or the kanamycin resistance 
gene (5'-GCCGAATATCATGGTGGAAA and 5'-AATATCACGGGTAGCC 
AACG). 



NRO assay 

NRO assays using BrUTP to label newly synthesized RNA were based 
on the methods developed by the Lis group (Core et al, 2008) and the 
Fu group (Lin et al, 2008). Briefly, ~1 × 10^7 cells were washed three 
times in cold PBS on ice, followed by incubation in 10 ml ice-cold 
swelling buffer (10 mM Tris-HCl pH 7.5, 3 mM CaCl2, 2 mM MgCl2) for 
5 min. Cells were collected with a scraper and pelleted at 500 g at 4°C 
for 10 min. Nuclei were isolated by pipetting cells up and down 20 
times using a cut P1000 tip in 1 ml lysis buffer (swelling buffer with 
0.5% NP-40, 10% glycerol, 20 U/ml RNasin). Nuclei were washed and 
pelleted in the lysis buffer, and resuspended in 1 ml freezing buffer (50 mM 
Tris-HCl pH 8.3, 40% glycerol, 5 mM MgCl2, 0.1 mM EDTA). Nuclei 
were pelleted again at 1000 g at 4°C for 5 min, and resuspended in 
100 µl of freezing buffer. An equal volume (100 µl) of reaction buffer 
(10 mM Tris-HCl pH 8.0, 5 mM MgCl2, 1 mM DTT, 300 mM KCl, 20 U of 
RNase inhibitor, 1% sarkosyl, 0.5 mM of BrUTP, ATP, and GTP, and 
5 µM CTP) was added to carry out NRO at 30°C for 5 min. 



Cell culture and reporter assays 

Experiments using constructs containing P_CMV, P_TAL, P_CRE, P_NFκB, P_GRE, 
P_HSE, or P_SRE were carried out in Human Embryonic Kidney (HEK) 293 
cells, which were maintained in Dulbecco's Modified Eagle's Medium 
(DMEM) supplemented with 10% fetal bovine serum (FBS). Transfection 
was carried out using Lipofectamine 2000 (Invitrogen). The following 
conditions were used to induce P_CRE, P_NFκB, and P_GRE: 5 µM forskolin 
(Fisher) for P_CRE, 0.1 µg/ml human TNFα (Sigma) for P_NFκB, and 5 µM 
dexamethasone (Sigma) for P_GRE. Cells were treated with inducing agents 
16 h after transfection. Six hours after treatment, total cellular RNA was 
extracted using TRIzol (Invitrogen). To induce P_HSE, cells were incubated 
at 45°C for 15 min. To induce P_SRE, cells were first grown in the reduced 
serum media Opti-MEM (Invitrogen) and then in DMEM with 10% FBS. 

Constructs containing P_E2F or P_MyoG were studied in proliferating and 
differentiating C2C12 cells. Briefly, C2C12 cells were maintained at 
30-70% confluency in DMEM supplemented with 10% FBS. Transfection 
was carried out using Lipofectamine 2000 (Invitrogen) when the 
confluency of cells was 30%. After 16 h, cells were split into a proliferation 
group and a differentiation group. For the proliferation group, cells were 
still maintained in DMEM with 10% FBS. For the differentiation group, 
cells were switched to DMEM with 2% horse serum when they reached 
90% confluency. qRT-PCR was carried out 24 h after transfection. 

pTRE-RIF was studied in HeLa Tet-On cells (gift from Andrew 
Harris, UMDNJ). Briefly, cells were maintained in DMEM supplemented 
with 10% FBS and 200 µg/ml G418. Transfection of pTRE-RIF was 
carried out using jetPEI (Polyplus-transfection). Cells were treated 
with different doses of Dox (Sigma) right after transfection. After 24 h, 
cell lysis and luciferase assay were carried out using the 
Dual-Luciferase Reporter Assay System (Promega). 

RNase protection assay 

The DNA template for RPA probe was produced by putting a fragment 
surrounding the proximal polyA site of pRiG-77S.AD into the 



NRO reporter assay 

For NRO of reporter constructs, transfected HeLa Tet-On cells were 
treated with 10 ng/ml or 10 µg/ml Dox. After 16 h, cells were harvested 
for NRO as described above. The NRO reaction was stopped by adding 
TRIzol into the reaction mix. RNA was then extracted, and treated with 
DNase I to remove DNA. Newly synthesized labeled RNA was pulled 
down by anti-BrdU antibody conjugated to the protein G Dynabeads 
(Invitrogen) in the binding buffer (10 mM Tris-HCl pH 7.4, 500 mM 
NaCl, 2.5 mM MgCl2, 0.5% Triton X-100, 0.5 µg/µl yeast RNA, 
0.1 µg/µl yeast tRNA) at 4°C for 2 h. After immunoprecipitation (IP), beads 
were washed with the washing buffer (10 mM Tris-HCl pH 7.4, 
500 mM NaCl, 2.5 mM MgCl2, 0.5% Triton X-100) six times. RNA was 
eluted from the beads by the RLT buffer (Qiagen) supplemented with 
yeast RNA (0.5 µg/µl). RNA was then purified by the RNeasy kit 
(Qiagen) and used for qRT-PCR with random hexamers as primer for 
the RT step. Primers used for qRT-PCR of NRO transcripts are shown in 
Supplementary Table 3. 



GRO-seq 

C2C12 cells grown to ~80% confluency in DMEM + 10% FBS were 
used for NRO, as described above. After NRO, the reaction mix was 
treated with 50 U DNase I at 37°C for 1 h followed by incubation at 55°C 
for 1 h with an equal volume of buffer S containing 20 mM Tris-HCl pH 
7.4, 2% SDS, 10 mM EDTA, 200 mg/ml Proteinase K. RNA was 
extracted by phenol/chloroform twice, followed by ethanol 
precipitation. Purified RNA was fragmented to ~100 nt by the RNA 
Fragmentation Reagents kit (Ambion) at 70°C for 15 min, and was 
subjected to IP with the anti-BrdU antibody (2 mg/reaction, Sigma) 
conjugated on the protein G Sepharose (GE Healthcare). 
Immunoprecipitated RNA was extracted by phenol/chloroform and ethanol 
precipitation. The IP step was repeated twice to obtain pure nascent 
RNA. Purified nascent RNA was treated with shrimp alkaline 
phosphatase (Roche) at 1 U/reaction at 37°C for 30 min to remove 
the 3' phosphate group from RNA, followed by addition of the 5' 
phosphate group with T4 kinase (NEB) at 10 U/reaction at 37°C for 1 h. 
Treated RNA was purified and used to prepare a sequencing library 
using the Illumina Small RNA Sample Prep Kit (v1.5). Deep 
sequencing was carried out on an Illumina Genome Analyzer IIx, 
which generated over 12.5 million uniquely mapped single reads 
(76 nt). 



Supplementary information 

Supplementary information is available at the Molecular Systems 
Biology website (www.nature.com/msb). 